t stands for time
m/z for mass/charge ratio
r stands for replicate or repetition, and is effectively sample number within a condition
I = intensity

Top Level



Start with
(t,m/z,r,I)




Round t & m/z
to user-specified levels
(levels may not
form regular grid)

►

Add intensities
with t, m/z, r
that are identical
after rounding
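The rounding and summing steps above can be sketched as follows. This is a minimal illustration, not the implementation behind the figure: the function name `bin_and_sum` and the bin widths `t_step` and `mz_step` are hypothetical, and a regular grid is assumed even though the figure notes that the user-specified levels need not form one.

```python
from collections import defaultdict


def bin_and_sum(measurements, t_step, mz_step):
    """Round t and m/z to a grid and add intensities that coincide.

    measurements: iterable of (t, mz, r, intensity) tuples.
    t_step, mz_step: assumed bin widths (hypothetical parameters).
    Returns a dict mapping (t_bin, mz_bin, r) -> summed intensity.
    """
    grid = defaultdict(float)
    for t, mz, r, intensity in measurements:
        t_bin = round(t / t_step) * t_step
        mz_bin = round(mz / mz_step) * mz_step
        # Round the keys again to suppress floating-point noise.
        key = (round(t_bin, 6), round(mz_bin, 6), r)
        grid[key] += intensity
    return dict(grid)


data = [(10.1, 500.02, 1, 3.0), (10.2, 500.04, 1, 2.0), (11.0, 500.02, 1, 1.0)]
binned = bin_and_sum(data, t_step=0.5, mz_step=0.1)
```

The first two measurements fall into the same (t, m/z, r) cell after rounding, so their intensities are added; the third stays in its own cell.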







Align Data 

within 
conditions 
(Fig. 7A) 



100 



102 



Shift one data set 
relative to the other 
by time shift Ts 



110 



-Yes 



Find mean of 

selected 
measurements 



116 



For each t,r: Extract
user-specified
quantile range (typically
90-95%ile or 75-95%ile)
of intensity
measurements
(across m/z)



114 



Divide 
intensities for 
this (t,r) by that 
mean 




Get average 
(across r) at 
each t,m/z 



101 



For each
possible shift
in range
calculate
correlation of
A, shifted B



103

Any correlation meet
significance criteria?
(currently
significantly different
from correlation = 0)

105

Yes 



118 



120 



Yes 



Make list of 
sufficiently long (in 
time t) runs: m/z,start, 
finish 
(required length is 
user specified, 
currently 15s or 
characteristic small 
peak width in mass 
chromatogram) 



For each t,m/z,calculate p- 
value (or pseudo p-value) 
for difference between 
conditions: 
see "Finding 
Differences" (Fig. 2) 



Set time shift to 
give maximum 

correlation 
significantly>0 



For each entry on 
list, calculate (user 
specified) function 
f of p-values 
(currently f = 
sum(log(p-value)) ) 



124 



126 



Sort list by f(p-values);
most significant at top



107 



For each m/z find 
contiguous cells (across 
t) with significant p- 
value (user specified, 
typically 0.05) 



122 



128 



Filter for false 
positives: see "Filter 
for False Positives I" 
(Fig. 4) or, for 
alternate method, 
"Filter for False 
Positives 111" (Fig. 5) 



132 



Output and
iteration
module
(Fig. 6)



134 




Resort list to 
group related 
signals: see 
"Grouping 
Results" (Fig. 3) 



130 



No

Return sorted list (with or
without graphs)



138 



Fig. 1A 



t stands for time 
m/z for mass/charge ratio 

r stands for replicate or repetition, and is effectively sample number within a 
condition 
I = intensity 



Top Level 



Start with
(t,m/z,r,I)



100 





Round t & m/z 

to user- 
specified levels 

(levels may not 
form regular 
grid) 


— ► 


Add intensities 
with t, m/z, r that 
are identical after 
rounding 








102 



104 



Find mean of 

selected 
measurements 



116 



114 



For each t,r: Extract 
user-specified 
quantile range (typical 
90-95%ile or 75- 
95%ile) of intensity 
measurements, 
(across m/z) 



-Yes- 



112 



Divide 
intensities for 
this (t,r) by that 
mean 
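The normalization boxes (extract a quantile range across m/z, find its mean, divide by that mean) can be sketched for a single (t, r) as below. The function name and the percentile parameters `lo_pct` and `hi_pct` are hypothetical, and the rank-based cut is only one of several reasonable ways to select a quantile range.

```python
def normalize_by_quantile_mean(intensities, lo_pct=90, hi_pct=95):
    """Normalize the intensities measured across m/z for one (t, r).

    The mean of the values between the lo_pct and hi_pct percentiles
    (selected by rank) is used as the scale factor.
    """
    ordered = sorted(intensities)
    n = len(ordered)
    start = lo_pct * n // 100
    stop = max(start + 1, hi_pct * n // 100)
    selected = ordered[start:stop]
    scale = sum(selected) / len(selected)
    if scale == 0:
        # Nothing sensible to divide by; return the data unchanged.
        return list(intensities)
    return [x / scale for x in intensities]


values = list(range(1, 101))
normalized = normalize_by_quantile_mean(values)
```

With the values 1 to 100, the selected range is 91 to 95, whose mean is 93, so the value 93 maps to 1.0 after normalization.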



118 



-No- 



Align Data 

across 
conditions 
(Fig. 7B) 



104B 



For each t,m/z, calculate
p-value (or pseudo p- 
value) for difference 
between conditions: 

see "Finding 
Differences" (Fig. 2) 



For each m/z find 
contiguous cells (across 
t) with significant p- 
value (user specified, 
typically 0.05) 



120 



122 



Make list of 
sufficiently long (in 
time t) runs: m/z,start, 
finish 
(required length is 
user specified, 
currently 15s or 
characteristic small 
peak width in mass 
chromatogram) 

124 



For each entry on list, 

calculate (user 
specified) function f of 
p-values (e.g. f = 
sum(log(p-value)) ) 



Sort list by f(p-values);
most significant at top
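Steps 120 to 126 (find contiguous significant cells, keep sufficiently long runs, score each run, sort) might look like the sketch below for a single m/z. The function name, the `min_len` parameter (in time bins rather than seconds) and the inclusive index convention are assumptions; the score uses the f = sum(log(p-value)) example named in the figure and requires p-values greater than zero.

```python
import math


def significant_runs(p_by_time, alpha=0.05, min_len=3):
    """Find runs of consecutive time bins with p < alpha for one m/z.

    Returns a list of (start, finish, score) with inclusive bin indices,
    where score = sum(log(p)) over the run (more negative = more significant).
    """
    p = list(p_by_time)
    runs, start = [], None
    # A trailing sentinel of 1.0 closes any run that reaches the last bin.
    for i, value in enumerate(p + [1.0]):
        if value < alpha:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                score = sum(math.log(q) for q in p[start:i])
                runs.append((start, i - 1, score))
            start = None
    return runs


p_values = [0.5, 0.01, 0.02, 0.01, 0.3, 0.04, 0.6]
runs = significant_runs(p_values, alpha=0.05, min_len=3)
ranked = sorted(runs, key=lambda entry: entry[2])  # most significant first
```

In this example only bins 1 to 3 form a long enough run; the isolated significant bin at index 5 is dropped.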



Resort list to group 
related signals: 
see "Grouping 
Results" (Fig. 3) 



126 



128 



130 



Filter for false positives: 

see "Filter for False 
Positives III" (Fig. 8) or, 
for alternate method, 
"Filter for False 
Positives I" (Fig. 4) 

132B



Show user sorted list of differences. 
Make comparative graph of average 
signal +/- deviation for each element 
on list. Time range shown is 3x width 
of point-by-point significant difference 
width 




134B 



Fig. 1B 



Return sorted list (with or
without graphs)

138 



Start with gridded,
(optionally) shifted,
normalized (t,m/z,r,I)
measurements



A and B represent 
two experimental 
conditions to be 
compared 



Finding Differences 



200 



202 



For each (A/B,t,m/z) 
find mean and variance 
of I across r 
refer to means as 
"average signal" later 



Other possible 
methods: Wilcoxon, 
etc. 

+ 




Yes 



Convert mean and variance of
measured intensities to mean
and variance of imputed
logarithmic distribution

$$\sigma_{\log}^2 = \log\left(1 + \frac{\sigma^2}{\mu^2}\right)$$

$$\mu_{\log} = \log(\mu) - \frac{\sigma_{\log}^2}{2}$$

206 



Method 1: t-test

Calculate difference in
means for conditions
A and B at each t,m/z.
Call it D

208



Method 2: cumulative distribution fn. 



Calculate standard error of
difference in means:

$$\sigma_D = \sqrt{S_A^2 + S_B^2}$$

where $S_A$ is standard error of mean
for A, similarly for B

210



Put list of D's in order from
smallest to largest:
D(1), D(2), ..., D(N) where
N = number of cells (number of
(t,m/z) pairs)



In each (t, m/z) bin, the
p-value is the quantile of
$D / \sigma_D$
in a t-distribution with F
degrees of freedom



212 



$$F = \frac{\left(S_A^2 + S_B^2\right)^2}{\dfrac{S_A^4}{n_A - 1} + \dfrac{S_B^4}{n_B - 1}}$$

which includes the
Welch correction to
degrees of freedom for
unequal variances
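For one (t, m/z) cell, the quantities in Method 1 can be computed as in the sketch below, which uses the usual Welch form: the squared standard errors of the two means are added, and the degrees of freedom follow the Welch-Satterthwaite approximation. The function name is hypothetical. The final lookup of the p-value in a t-distribution is left out because the Python standard library has no t-distribution; a statistics library would be needed for that step.

```python
import math
from statistics import mean, variance


def welch_statistic(a, b):
    """Return (D / sigma_D, F) for replicate intensities a and b.

    D is the difference in means, sigma_D its standard error, and F the
    Welch-corrected degrees of freedom.
    """
    n_a, n_b = len(a), len(b)
    s_a2 = variance(a) / n_a  # squared standard error of the mean for A
    s_b2 = variance(b) / n_b  # squared standard error of the mean for B
    d = mean(a) - mean(b)
    sigma_d = math.sqrt(s_a2 + s_b2)
    dof = (s_a2 + s_b2) ** 2 / (s_a2 ** 2 / (n_a - 1) + s_b2 ** 2 / (n_b - 1))
    return d / sigma_d, dof


t_stat, dof = welch_statistic([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

When both conditions have the same number of replicates and the same variance, the degrees of freedom reduce to 2(n - 1), as in the ordinary two-sample t-test.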



203 








The pseudo p-value 
corresponding to D(i) is i/N. 
Assign this pseudo p-value 
to the appropriate (t, m/z) 
bin. 
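Method 2 is a rank transform: each D is replaced by its rank divided by the number of cells. A minimal sketch follows, with a hypothetical function name and a dictionary keyed by (t, m/z) as the assumed data layout. Ties are broken by sort order, which the figure does not specify.

```python
def pseudo_p_values(d_by_cell):
    """Map each (t, mz) cell to i/N, where i is the 1-based rank of its
    difference D among all N cells, ordered from smallest to largest."""
    ordered = sorted(d_by_cell, key=d_by_cell.get)
    n = len(ordered)
    return {cell: (i + 1) / n for i, cell in enumerate(ordered)}


differences = {(0, 100): 0.5, (0, 101): -2.0, (1, 100): 3.0, (1, 101): 0.1}
pseudo_p = pseudo_p_values(differences)
```

Here the most negative difference receives 1/4 and the largest receives 4/4.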


205 





Return 



Return 



214



Fig. 2 



207 



Start with ranked
list of results of the
form
(m/z,start,end)




Grouping Results 



For each m/z calculate list of base masses 
M(z) = z* (m/z - 1) for z= 1,2,3,..., k 
currently k=4, as higher charge states are rare 



300 






302 



for each (m/z,start,end), create 
a list of group members initially 
containing only itself 



304 



Make list of all 
pairs of (m/ 
z,start,end) triples 



310 



Take next pair M(z)_1,
M(z)_2 and form list of
all k^2 pairwise
differences




No 



This process finds ions with (putatively) 
different charge states (if difference is 
0) and putative isotopes or binning 
errors (if difference is not zero but less 
than resolution). Theoretically it could 
group things that should not be 
grouped, but this would not be 
particularly harmful, as the list could 
easily be re-sorted by f(p-value) and 
group labels could be dropped as 
needed. 
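The base-mass comparison described in this figure can be sketched as below. The function names and the `resolution` parameter are hypothetical; the grouping test (any of the k^2 pairwise base-mass differences smaller than the resolution) is inferred from the note above, since the text of the decision box itself is not legible here.

```python
from itertools import product


def base_masses(mz, k=4):
    """Candidate base masses M(z) = z * (m/z - 1) for z = 1..k."""
    return [z * (mz - 1.0) for z in range(1, k + 1)]


def may_group(mz_1, mz_2, resolution, k=4):
    """True if any of the k^2 pairwise differences between the two lists
    of base masses is smaller than the resolution."""
    pairs = product(base_masses(mz_1, k), base_masses(mz_2, k))
    return any(abs(m_1 - m_2) < resolution for m_1, m_2 in pairs)


# A species with base mass 1000 appears at m/z 1001 (z = 1) and 501 (z = 2).
same_species = may_group(1001.0, 501.0, resolution=0.5)
unrelated = may_group(1001.0, 700.0, resolution=0.5)
```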




Yes 



Re-sort ranked list. 
Each (unique) group is 

placed at rank of 
highest-ranking of its 
members in old list. 

322 




Add each to 
the others' list 
of group 
members 



318 



Leave Pair 
marked as 
ungrouped. 



320 



324 



Fig. 3 



Looking for "full shift"
differences that appear
to be due only to a
relative time shift in the
histograms. These are
where we see A>B
closely followed (in time
at same m/z) by A<B
(or vice versa)



Start with ranked list: 
(m/z,start,end); and 
average signals for 
conditions A and B 



Filter for False Positives I 



400 



For any pair of 
results with the 
same m/z 



412 



Take signals over time range 
T starting at earliest 
significant difference and 
ending at latest (of the pair 
being considered) 



Yes 



414 



Determine time shift S giving 
maximum correlation between 
averaged mass chromatogram 
for A and that for B over each 
relative shift from -T/4 to +T/4 in 
increments of one time bin 
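The search for the shift S can be sketched as an exhaustive scan over relative shifts from -T/4 to +T/4 time bins, keeping the shift with the highest Pearson correlation. The function names are hypothetical. The two averaged signals are assumed to have the same length and time binning, and the sign convention (A[i] paired with B[i + s]) is a choice made for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_shift(sig_a, sig_b):
    """Return (s, r): the shift s in bins, within -T/4..T/4, that maximizes
    the correlation r between A[i] and B[i + s] over their overlap."""
    t = len(sig_a)
    max_shift = t // 4
    best = (0, -2.0)
    for s in range(-max_shift, max_shift + 1):
        if s >= 0:
            a, b = sig_a[:t - s], sig_b[s:]
        else:
            a, b = sig_a[-s:], sig_b[:t + s]
        r = pearson(a, b)
        if r > best[1]:
            best = (s, r)
    return best


signal_a = [0, 0, 1, 5, 1, 0, 0, 0, 0, 0, 0, 0]
signal_b = [0, 0, 0, 0, 1, 5, 1, 0, 0, 0, 0, 0]  # same peak, two bins later
shift, corr = best_shift(signal_a, signal_b)
```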



416 




Don't group



406 



404 



Yes 



Shift signal A or B in time by S 
to give best alignment, 
dropping non-overlapping 
portions of signal, (remaining 
overlap is T-S long) 



418 



Check for significant 
difference between these 
shifted signals using same 
method as in "Finding 
Differences" 



Is the end of earlier signal close
enough (user specified, currently
15 sec, chosen based on
preliminary visual inspection of
results from previous step) to start
of later?

408 



420 

Is there a significant
difference between
the shifted signals?



-No 



Don't group



410 



No 



Group; label as
possible false
positive.



-Yes- 



422 



Group; assign 
new p-value 
according to 
difference in 
levels 



424 



Yes 



Record difference: 
as an observed shift 



Can also look for minimal difference in AUC (area
under curve) or least squares difference or least
significance by measure used to find signals, and
can use unbinned signals and/or weighted
averages based on measures for different shifts
to get a continuous estimate of S

Fig. 4



426 



430 







Filter for False Positives II 



Looking for differences 
that appear to be half a 
shift (see "Filter for false 
positives I") where other 
half was not statistically 
significant 



n is user specified, 
currently n=2 seems to 

work well, 
arrived at by inspection 
of preliminary results 



Start with ranked 
list: (m/z,start,end) 
and average 
signals NOT 

grouped by part I 



500 



For each 
difference in list 



502 



504 



Does either average
signal have a peak of
span 2n+1? (i.e. an
intensity greater than that
of n neighbors on both
sides?)
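The peak test in this decision box can be written directly from its definition. The sketch below uses a hypothetical function name, expects the signal as a list, and reads "greater than" as strictly greater.

```python
def has_peak(signal, n=2):
    """True if some point is strictly greater than each of its n
    neighbours on both sides (a peak of span 2n + 1)."""
    for i in range(n, len(signal) - n):
        neighbours = signal[i - n:i] + signal[i + 1:i + n + 1]
        if all(signal[i] > value for value in neighbours):
            return True
    return False


peaked = has_peak([1, 2, 5, 2, 1], n=2)
monotone = has_peak([1, 2, 3, 4, 5], n=2)
```

A signal shorter than 2n + 1 points cannot contain such a peak, so the function returns False for it.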



Yes 



No 



508 



Find slope "b" of
least squares line:

I = a + bt
for both conditions



506 



514 



Yes- 



Label as false 
positive 



Are both slopes
significantly different from
zero? (standard linear
regression, p=0.05)



Yes 



Is the ratio of slopes (bigger/
smaller) < threshold?
(user specified, current value 3,
determined by inspection of
preliminary results)

512 



Yes 



No 



Don't label as 
false positive 



510 



Find 



from regression line for each 
condition 



Record difference: 
as an observed shift 



518 




520 



516 



Return 



Fig. 5 



522 



Output and Iteration 



Start with List 
of differences 

filtered for 
false positives 




Make comparative 
graph of average 

signal +/- deviation 
for each element 

on list. Time range 
shown is 3x width 
of point-by-point 

significant 
difference width 



612 



618 



Calculate Ts as median 
of all recorded shifts 
from filtering for false 
positives 




► 



620 



Fig. 6 






Align Data within Conditions 



Start with
rounded
(t,m/z,r,I)

700

Calculate base peak 
chromatogram (or 
some other summary 
of intensities across 
m/z at each time) for 
each replicate 

702



Use dynamic time warping 

to align base peak 
chromatograms (or other 
representation), obtaining 
time shift functions for 
each replicate 
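Dynamic time warping can be illustrated with the textbook dynamic-programming formulation below. This is a generic sketch, not the specific variant used for the figure: the function name is hypothetical, the local cost is the absolute intensity difference, and no slope or window constraints are applied. The returned path pairs time indices of the two chromatograms, from which a time shift function for a replicate can be read off.

```python
def dtw_path(x, y):
    """Classic O(len(x) * len(y)) dynamic time warping.

    Returns (total_cost, path), where path is a list of (i, j) index
    pairs aligning x[i] with y[j].
    """
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover the alignment.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


chrom_1 = [0, 0, 1, 5, 1, 0]
chrom_2 = [0, 1, 5, 1, 0, 0]  # same peak, one bin earlier
total, path = dtw_path(chrom_1, chrom_2)
```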




Binned data



Apply time shift 
functions to 
binned data 




Original data



706 



Apply time shift 
functions to 
original data 



708 



Repeat steps 
102,104 



710 






Return 

712 



Fig. 7A 



Align Data Across Conditions 





Find landmarks 
(Fig. 7C) 










Filter landmarks 
(Fig. 7D) 








Shift landmarks to align 
data; interpolate between 
landmarks. 







902 



904 



Return 



Fig. 7B 




Find Landmarks 



Find user specified 

percentile of 
values (across all 
data sets) 






1000 





Find peaks rising 
above the specified 
percentile in each 
data set 






1002 

For each m/z: if not
same number of 
peaks for each data 
set, discard all peaks 
for this m/z 








1004 


Fit each remaining 
peak with a cubic. 
Location of max of the 
fit cubic is location of 
the landmark 








1006 


Optionally, assign 2
additional landmarks
at positions of 1/2
height (above
percentile cutoff) of
the fit cubic






1008 



Return 



Fig. 7C 






Filter Landmarks 



Select a data set to
be the base (default:
just take 1st set)



1100 



Label all landmarks 
with shift for 
corresponding 
landmark in base 
set. 



E.g. if base landmark is 
at time 25 minutes and 
corresponding 
landmark is at time 
24.5, 25, or 25.5, then 
the shift is -0.5,0, or 0.5 
respectively 



1102 



Set to 0 shifts < a 
user defined 
minimum (default = 2 
minutes in either 
direction) 



1104 



Set any shift not in a 
run of shifts of the 
same sign to 0. 
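The two filtering rules can be sketched as follows, taking the boxes as written: shifts smaller in magnitude than the user-defined minimum are set to 0, then any remaining shift with no neighbouring landmark of the same sign is set to 0. The function name is hypothetical, and reading "a run" as at least two adjacent landmarks is an assumption.

```python
def filter_shifts(shifts, min_shift=2.0):
    """Filter landmark shifts (ordered by time, in minutes).

    1. Set to 0 every shift whose magnitude is below min_shift.
    2. Set to 0 every shift that has no adjacent shift of the same sign.
    """
    def sign(value):
        return (value > 0) - (value < 0)

    kept = [s if abs(s) >= min_shift else 0.0 for s in shifts]
    filtered = []
    for i, s in enumerate(kept):
        left = i > 0 and sign(kept[i - 1]) == sign(s)
        right = i + 1 < len(kept) and sign(kept[i + 1]) == sign(s)
        filtered.append(s if s != 0 and (left or right) else 0.0)
    return filtered


result = filter_shifts([0.5, 2.5, 3.0, -2.5, 0.1, 4.0])
```

In this example the two adjacent positive shifts survive, while the isolated negative shift and the isolated final shift are zeroed.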



1106 




Fig. 7D 



Start with ranked list of 
differences (m/z,start, 
end) and mean and std 
error of mean of signals 
for conditions A and B 



For each 
difference 



Filter for False Positives III 



Take signal in time range
t_min - w to t_max + w,
where t_min and t_max
are the start and end of
the region of significant
difference and w = t_max -
t_min



Test explained on pp.
188-189 of Snedecor &
Cochran, Statistical
Methods, 8th edition,
1989, Iowa State
University Press




Label as likely 
false positive 



818 



Fig. 8 



For each 
difference 



Estimating Relative Amounts 



-No- 




-Yes- 



Region of 
interest = 
Region of 
significance 



Find time range of 
peak surrounding 
region of significance 
Set region of 
interest to this time 
range 




Yes 



Find area under 
curve in region of 
interest for each 
condition 



Find difference at 

each point in 
region of interest 



Use:

AUC(cond1) /
AUC(cond2)



Use: 

exp(mean(differences in logs)) 
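Both estimates of relative amount can be sketched as below. The function names are hypothetical and the trapezoidal rule is an assumed choice for the area under the curve. The second estimate assumes the point-by-point differences are taken between logarithms of strictly positive intensities, so the result is a geometric mean of the ratios.

```python
import math


def auc(signal, dt=1.0):
    """Trapezoidal area under a curve sampled every dt time units."""
    return sum((signal[i] + signal[i + 1]) * dt / 2.0 for i in range(len(signal) - 1))


def ratio_by_auc(cond_1, cond_2, dt=1.0):
    """Relative amount as AUC(cond1) / AUC(cond2) over the region of interest."""
    return auc(cond_1, dt) / auc(cond_2, dt)


def ratio_by_logs(cond_1, cond_2):
    """Relative amount as exp(mean(differences in logs))."""
    diffs = [math.log(a) - math.log(b) for a, b in zip(cond_1, cond_2)]
    return math.exp(sum(diffs) / len(diffs))


peak_1 = [2.0, 4.0, 8.0, 4.0, 2.0]
peak_2 = [1.0, 2.0, 4.0, 2.0, 1.0]
by_auc = ratio_by_auc(peak_1, peak_2)
by_logs = ratio_by_logs(peak_1, peak_2)
```

For signals that differ by a constant factor, as here, the two estimates agree.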




Fig. 9 




Fig. 10 
Fig. 13

Figure 14A. No significant difference between sample sets A and B in means by t-test: p-value 0.131

Figure 14B. No significant difference between sample sets A and B in means by t-test: p-value 0.0506

Figure 14C. Significant difference between sample sets A and B in means by t-test: p-value 0.0499

Figure 14D. Significant difference between sample sets A and B in means by t-test: p-value 0.0194